1.  INTRODUCTION


In an earlier paper, Kotz and Shanbhag (1980) (to be referred to as KSh
for brevity), we presented a detailed discussion of new approaches to
univariate probability distributions. We concentrated on representations
and characterizations of probability distribution functions in terms of
conditional expectations (specifically in terms of the expected remaining
life function, or e.r.l. function) and in terms of hazard measures.

In  the  course  of  our  investigations,  we  succeeded  in  extending, 
generalizing  and  simplifying  a  number  of  results  dealing  with  e.r.l. 
functions  and  hazard  measures  which  have  appeared  in  the  literature  of 
the  last  two  decades.  We  also  presented  some  convergence  theorems  which 
shed  light  on  the  structure  of  e.r.l.  functions,  hazard  measures  and 
distribution  functions  in  both  the  continuous  and  discrete  cases  (but 
not  restricted  to  these  cases  only). 

In many practical applications that require model building, there are
indications that such results may be of special importance.

The present paper is structured along the lines of KSh (1980) but is an
initial attempt towards studying the more subtle and difficult problems
of multivariate distributions. In this paper, we shall attempt to unify,
extend, generalize and simplify results scattered in the literature
related to structures of multivariate distributions (in particular, but
not exclusively, of a non-absolutely continuous nature) and to various
definitions of hazard measures. (Unlike the univariate case, there is no
unique definition of this concept in the multivariate case in the
literature.) Among other results, an all-encompassing generalization of
the scalar multivariate hazard measure is given, and an overall structure
as well as certain convexity properties and their implications related to
this measure are revealed. In addition, we define and investigate
multivariate analogues and extensions of e.r.l. functions and trace their
relations, first to the multivariate probability distribution functions
and then to the corresponding univariate concept on the one hand, as well
as to (various generalizations of) multivariate hazard measures on the
other. Following the approach adopted in KSh (1980) for the univariate
case, we do not restrict ourselves necessarily to non-negative random
variables. (The notions of the hazard measure as well as that of the
e.r.l. functions in the literature are usually limited to the
non-negative case.)

Most of the groundwork as far as the convergence and representation
theorems are concerned has been laid in KSh (1980). However, in the
present paper we clarify, using examples of specific distributions, some
ambiguities and certain inconsistencies related to the structure of
various characteristics of multivariate distributions, in our search for
the most meaningful and practically attractive expressions and
representations of these distributions which would expose the hidden
dependencies among jointly distributed random variables. These findings
could prove to be of some significance in future developments, at least
in areas such as reliability and pattern recognition.

2.  A  GENERALIZED  MULTIVARIATE  HAZARD  GRADIENT  AND  A 
MULTIVARIATE  GENERALIZATION  OF  THE  e.r.l.  FUNCTION 

In this section, we shall give, among other things, two theorems that
follow as direct corollaries of KSh (1980). These concern respectively a
generalized multivariate hazard gradient and an analogous multivariate
generalization of the e.r.l. function.

For  multivariate  distributions,  there  exist  in  the  literature 
basically  two  approaches  to  defining  hazard  functions,  both  confined 
predominantly  to  absolutely  continuous  distributions  on  Euclidean  spaces. 

The first definition, adopted and analyzed by, among others, Basu (1971)
and Puri and Rubin (1973), is a straightforward extension of the
univariate concept. (A purely discrete case was also considered by Puri
and Rubin (1973).) The hazard function of a random vector
$X = (X_1,\ldots,X_p)$ is defined in this case to be a real-valued
function $r$ on $\{x: \bar F(x) > 0\}$ with values

$$r(x) = f(x)/\bar F(x),$$

where $x = (x_1,\ldots,x_p) \in R^p$, $f(x)$ is the probability density
function, and $\bar F(x)$ is the survivor function given by

$$\bar F(x) = P(X \ge x).$$

(Here as well as in what follows the inequalities for vectors are to be
understood componentwise.) This concept was further discussed by Block
(1977), where additional closely related variants were proposed, and
treated in a somewhat more unified manner in Galambos and Kotz (1978).

We intend to generalize this definition and examine it in greater detail.
However, since our contribution in this case is rather substantial and
does not rely very heavily on KSh (1980), we shall deal with it
separately in the next section (i.e., Section 3 of the paper).

The second approach, due to Johnson and Kotz (1975a) and Marshall (1975),
defines a multivariate hazard gradient (in an absolutely continuous case)
as the vector-valued function $h$ on $\{x: \bar F(x) > 0\}$ with values

$$h(x) = \left(-\frac{\partial}{\partial x_1},\ldots,-\frac{\partial}{\partial x_p}\right)\log \bar F(x) = -\operatorname{grad}\,\log \bar F(x)$$

(except for a set of Lebesgue measure zero). As was shown by Marshall
(1975) in the absolutely continuous case, the vector-valued $h$ uniquely
determines the probability distribution function (d.f.) or equivalently
the survivor function. Note that each one of the components of $h(x)$
depends in general on all the variables $x_i$ ($i = 1,2,\ldots,p$). In
the first part of this section (i.e. in part a) we shall generalize the
gradient $h$ to the case of arbitrary d.f.'s and at the same time reduce
some redundancy existing in the structure of the components of this
gradient. The main result involving a representation given in this part
subsumes Marshall's (1975) result and is essentially a corollary of
Propositions 5 and 8 of KSh (1980).
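As a small numerical illustration of the gradient definition above, the following sketch approximates $h(x) = -\operatorname{grad}\,\log \bar F(x)$ by central differences. The survivor function used, $\exp(-x_1 - x_2 - \theta x_1 x_2)$ (Gumbel's bivariate exponential), the value of `theta`, and the function names are illustrative assumptions and are not taken from the paper.

```python
import math

def survivor(x1, x2, theta=0.5):
    """Gumbel's bivariate exponential survivor function (illustrative choice)."""
    return math.exp(-x1 - x2 - theta * x1 * x2)

def hazard_gradient(sf, x, eps=1e-6):
    """Central-difference approximation of h(x) = -grad log sf(x)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        dn = list(x)
        dn[i] -= eps
        grad.append(-(math.log(sf(*up)) - math.log(sf(*dn))) / (2 * eps))
    return grad

h = hazard_gradient(survivor, [1.0, 2.0])
# For this survivor function the closed form is (1 + theta*x2, 1 + theta*x1),
# so each component indeed depends on the other variable.
```

The example makes visible the point noted in the text: each component of $h(x)$ involves all the coordinates of $x$.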

In KSh (1980), motivated by the remark contained in Shanbhag (1970) and
the results of Hamdan (1972), Kotlarski (1972), Shanbhag and Bhaskara Rao
(1975) and Gupta (1975), we also extended the concept of the e.r.l.
function of a positive random variable to an arbitrary random variable
and gave a representation for a probability distribution in terms of this
function. Some possibilities of the applicability of the concept in
practice have been indicated in KSh (1980) and the references cited
above. (Also, see Hall and Wellner (1981), Hollander and Proschan (1984)
and Glanzel et al. (1984) for further information and references on the
e.r.l. function.) A variety of multivariate generalizations of this
function can of course be constructed. However, we intend in this case to
deal only with a certain construction that has features closely
resembling those of the multivariate hazard function of the present
section. The representation theorem in this latter case follows as a
corollary of KSh (1980). In view of the prevailing analogy, we shall
devote the second part of this section (i.e. part b) to discussing this
particular version of e.r.l. functions and revealing some of its
properties, including the aforementioned theorem. For a related but
independently carried out investigation of multivariate analogues of
e.r.l. functions, the reader may wish to consult Zahedi (1985). This work
is, however, along different lines.


a.  A generalized hazard gradient and some of its basic properties.

Let $p \ge 2$, $F$ be a d.f. on $R^p$ and $X = (X_1,X_2,\ldots,X_p)$ be a
random vector distributed according to this d.f. Let
$\nu_F^{(i)}(\cdot \mid x_{(i+1)})$, with
$x_{(i)} = (x_i,x_{i+1},\ldots,x_p)$, $x_{(1)} = x$, denote the hazard
measure on $R^1$ for the conditional distribution of $X_i$ given that
$X_{i+1} \ge x_{i+1},\ldots,X_p \ge x_p$ (as stipulated in Section 4 of
KSh (1980)) for every $x_{(i+1)} \in R^{p-i}$ and $i = 1,2,\ldots,p-1$.
(We define the conditional distribution to be arbitrary for any
conditioning set of measure zero.) Also, let $\nu_F^{(p)}(\cdot)$ denote
the corresponding hazard measure on $R^1$ for the marginal distribution
of $X_p$. Extending and modifying the definition of Johnson and Kotz
(1975a) and Marshall (1975), we call the family

$$\{\nu_F^{(i)}(\cdot \mid x_{(i+1)}): x_{(i+1)} \in R^{p-i},\ i = 1,2,\ldots,p-1;\ \nu_F^{(p)}(\cdot)\}$$

the hazard gradient relative to the d.f. $F$. We have the following
theorem, which is essentially a corollary of Propositions 5 and 8 of KSh
(1980) (see, also, Cox (1972)):


THEOREM 1.  The survivor function corresponding to $F$ is represented by

$$\bar F(x) = P(X \ge x) = \prod_{i=1}^{p}\Big\{\Big[\prod_{y_i \in D_i(x_{(i)})}\big(1 - \nu_F^{(i)}(\{y_i\} \mid x_{(i+1)})\big)\Big] \times \exp\big[-\nu_F^{c(i)}((-\infty,x_i] \mid x_{(i+1)})\big]\Big\},\quad x \in R^p, \tag{2.1}$$

and for a continuous $F$ the representation is

$$\bar F(x) = \exp\Big\{-\sum_{i=1}^{p}\nu_F^{(i)}((-\infty,x_i] \mid x_{(i+1)})\Big\},\quad x \in R^p, \tag{2.2}$$

where the notation $\nu_F^{(p)}(\cdot \mid x_{(p+1)})$ is used for
convenience to denote $\nu_F^{(p)}(\cdot)$, $e^{-\infty}$ is defined to
be zero, $D_i(x_{(i)})$ is the set of real points $y_i < x_i$ at which
$\nu_F^{(i)}(\{y_i\} \mid x_{(i+1)})$ is positive, and
$\nu_F^{c(i)}(\cdot \mid x_{(i+1)})$ is the continuous (non-atomic) part
of $\nu_F^{(i)}(\cdot \mid x_{(i+1)})$. Furthermore, if $F$ is continuous
and $\{F_n: n = 1,2,\ldots\}$ is a sequence of d.f.'s on $R^p$, then
using the same notation

$$\nu_{F_n}^{(i)}((-\infty,x_i] \mid x_{(i+1)}) \to \nu_F^{(i)}((-\infty,x_i] \mid x_{(i+1)}) \tag{2.3}$$

for each $x$ such that $\bar F(x) > 0$ and $i = 1,2,\ldots,p$ if and only
if $\{F_n\}$ converges to $F$.


Proof.  (2.1) and, if $F$ is continuous, (2.2) follow immediately from
Proposition 5 of KSh (1980) in view of the relation

$$P(X \ge x) = P(X_p \ge x_p)\prod_{i=1}^{p-1} P(X_i \ge x_i \mid X_{i+1} \ge x_{i+1},\ldots,X_p \ge x_p),\quad x \in R^p. \tag{2.4}$$

If $F$ is continuous, then the marginal distribution function of $X_p$ is
continuous and for every $x$ such that $\bar F(x) > 0$ and
$i = 1,2,\ldots,p-1$, the conditional distribution of $X_i$ given
$X_{i+1} \ge x_{i+1},\ldots,X_p \ge x_p$ is continuous. Also, if
$X^{(n)} = (X_1^{(n)},\ldots,X_p^{(n)})$ for each $n \ge 1$ is a random
vector distributed according to $F_n$, then for each $n \ge 1$

$$P(X^{(n)} \ge x) = P(X_p^{(n)} \ge x_p)\prod_{i=1}^{p-1} P(X_i^{(n)} \ge x_i \mid X_{i+1}^{(n)} \ge x_{i+1},\ldots,X_p^{(n)} \ge x_p),\quad x \in R^p. \tag{2.5}$$

Applying Proposition 8 of KSh (1980) to the survivor functions on the
r.h.s. of (2.5), it can be easily verified that the convergence part of
the theorem is valid.

Remark 1.

For absolutely continuous distributions, representation (2.2) reduces to
that of Marshall (1975). Both (2.2) and (2.1) are thus extensions of
Marshall's hazard gradient representation; moreover, the general
representation for purely discrete distributions follows from (2.1) in
the obvious manner.
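The purely discrete case of representation (2.1) can be checked numerically. The sketch below builds a hypothetical joint probability mass function on a 3 x 3 grid (the weights are arbitrary and not from the paper), computes the atoms of the conditional and marginal hazard measures, and compares the product in (2.1), which has no continuous part here, with the survivor function computed directly.

```python
from itertools import product

# Hypothetical joint pmf on {0,1,2} x {0,1,2}; the weights are arbitrary.
weights = {(a, b): 1 + a + 2 * b + a * b for a, b in product(range(3), repeat=2)}
total = sum(weights.values())
pmf = {pt: w / total for pt, w in weights.items()}

def surv(x1, x2):
    """P(X1 >= x1, X2 >= x2), computed directly from the pmf."""
    return sum(q for (a, b), q in pmf.items() if a >= x1 and b >= x2)

def atom_x1(y, x2):
    """Hazard mass at y for X1 given X2 >= x2: P(X1 = y | X1 >= y, X2 >= x2)."""
    return (surv(y, x2) - surv(y + 1, x2)) / surv(y, x2)

def atom_x2(y):
    """Hazard mass at y for the marginal distribution of X2."""
    return (surv(0, y) - surv(0, y + 1)) / surv(0, y)

def surv_from_hazard(x1, x2):
    """Right-hand side of (2.1) when the hazard measures are purely atomic."""
    out = 1.0
    for y in range(x1):   # atoms y < x1 of the conditional hazard measure
        out *= 1.0 - atom_x1(y, x2)
    for y in range(x2):   # atoms y < x2 of the marginal hazard measure
        out *= 1.0 - atom_x2(y)
    return out

max_err = max(abs(surv(a, b) - surv_from_hazard(a, b))
              for a, b in product(range(3), repeat=2))
```

The products telescope, which is why the two computations agree up to rounding error.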

Remark  2. 

The  "convergence"  part  of  Theorem  1  fails  to  be  valid  if  the 
assumption  of  continuity  of  F  is  omitted.  Examples  1-3  presented  in 
KSh  (1980)  following  Proposition  8  in  Section  4  are  sufficient  to 
illustrate  this  situation. 

Remark  3. 

The  hazard  gradient  obviously  has  other  versions  when  the  ordering 
of  the  variables  is  altered.  Under  a  specific  situation,  one  may  find 
a  particular  version  to  be  the  most  natural  and  easiest  to  handle.  In 
that  case,  we  shall  of  course  consider  the  corresponding  ordering  to  be 
the  one  implied  in  our  Theorem  1.  A  similar  remark  applies  to  the  result 
of  Theorem  2  below. 

Remark 4.

The following observation related to univariate hazard measures may be
appropriate at this point. (See also the beginning of Section 4 of KSh
(1980).) If $G$ is a d.f. on $R^1$, then according to representation
(4.1) in KSh (1980) either

$$\prod_{x_r \in D}\big(1 - \nu_G(\{x_r\})\big) = 0 \quad\text{or}\quad H_c(\infty) = \infty,$$

where $\nu_G$ is the hazard measure corresponding to $G$, $D$ is the set
of discontinuities of $\nu_G$ and $H_c(x) = \nu_G^{(c)}((-\infty,x])$,
$\nu_G^{(c)}$ being the continuous part of $\nu_G$. Whenever the right
extremity of $G$ is not one of its discontinuity points, we have
$\nu_G(\{x_r\}) < 1$ for all $x_r \in D$. Now the Borel zero-one law and
relation (16) given in Burrill (1972), p. 245, imply that
$\prod_{x_r \in D}(1 - \nu_G(\{x_r\})) = 0$ if and only if
$\sum_{x_r \in D}\nu_G(\{x_r\}) = \infty$, provided
$\nu_G(\{x_r\}) < 1$, $x_r \in D$. This leads us to the relation

$$\nu_G((-\infty,\infty)) = \sum_{x_r \in D}\nu_G(\{x_r\}) + H_c(\infty) = \infty \tag{2.6}$$

whenever the right extremity of $G$ is not one of its discontinuity
points. (This result was obtained earlier by Shanbhag (1979) using a
somewhat different argument.)

Remark  5. 

As a corollary of Theorem 1, it follows that the components of $X$ are
independent if and only if there exists a version of the hazard gradient
of $F$ such that $\nu_F^{(i)}(\cdot \mid x_{(i+1)})$ is independent of
$x_{(i+1)}$ for each $i = 1,2,\ldots,p-1$. The theorem also yields
several other interesting corollaries. In particular, since the theorem
also implies that every distribution on $R^p$ is characterized by its
hazard gradient, one could obviously use it to give further
characterizations of distributions, such as the Marshall-Olkin bivariate
distribution or Frechet's multivariate distribution with continuous
marginals or a multivariate Pareto distribution, for which the hazard
gradients are of a particularly appealing form.


b.  The generalized e.r.l. function and some relevant comments.

In view of Proposition 3 of KSh (1980), (2.4) in the proof of Theorem 1
above implies that under some mild assumptions there exists a
representation for the survivor function of every p-component random
vector $X = (X_1,\ldots,X_p)$ in terms of the conditional expectations
$E\{h_i(X_i) \mid X_i \ge x_i,\ldots,X_p \ge x_p\}$,
$(x_i,\ldots,x_p) \in R^{p-i+1}$, of monotone transforms $h_i$,
$i = 1,2,\ldots,p$. This is given by the following Theorem 2.

The theorem yields, among other things, that if $X$ is a random vector
with $E\{X_i^+\} < \infty$ for all $i = 1,2,\ldots,p$ (where
$X_i^+ = \max\{0,X_i\}$), then the conditional expectations
$E\{X_i - x_i \mid X_i \ge x_i,\ldots,X_p \ge x_p\}$,
$i = 1,2,\ldots,p$, $x\,(= (x_1,\ldots,x_p)) \in R^p$ (and hence
$E\{X - x \mid X \ge x\}$, $x \in R^p$) characterize the distribution of
$X$; the representation in this latter case is also obvious now. Since
the family of expectations
$\{E\{X_i - x_i \mid X_i \ge x_i,\ldots,X_p \ge x_p\}: i = 1,2,\ldots,p,\ x \in R^p\}$
avoids some of the redundancies existing in the function
$E\{X - x \mid X \ge x\}$, $x \in R^p$, and has all the obvious
requirements of an e.r.l. function, it would be reasonable to adopt it to
be the e.r.l. function of a multivariate probability distribution on
$R^p$.

THEOREM 2.  (A representation theorem). Let $X = (X_1,\ldots,X_p)$ be a
random vector with $p$ components and $h_i$, $i = 1,2,\ldots,p$, be
real-valued non-decreasing functions on the real line such that
$E\{h_i^+(X_i)\} < \infty$ for all $i = 1,2,\ldots,p$ (where
$h_i^+(X_i) = \max\{0,h_i(X_i)\}$). If $h_i$, $i = 1,2,\ldots,p$, are
such that
$h_i(x_i) < E\{h_i(X_i) \mid X_i \ge x_i, X_{i+1} \ge x_{i+1},\ldots,X_p \ge x_p\}$
whenever
$P\{X_i \ge x_i, X_{i+1} \ge x_{i+1},\ldots,X_p \ge x_p\} > 0$, then the
survivor function corresponding to $X$ is given by

$$P(X \ge x) = G(x),\quad x\,(= (x_1,\ldots,x_p)) \in R^p, \tag{2.7}$$

where $G$ is the left continuous function satisfying


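$$G(x) = \begin{cases}
0 & \text{if } x_j > b_j \text{ for some } j,\ 1 \le j \le p,\\[6pt]
\displaystyle\lim_{\substack{y_i \to -\infty\\ i=1,\ldots,p}}\ \prod_{i=1}^{p}\Big\{\frac{g_i(y_i,x_{(i+1)})}{g_i(x_{(i)})}\Big[\prod_{z \in D^{(i)}_{y_i,x_i}} g_i^*(z,x_{(i+1)})\Big]\exp\Big[-\int_{y_i}^{x_i}\frac{dh_i^{(c)}(z)}{g_i(z,x_{(i+1)})}\Big]\Big\} & \text{if } x_j < b_j \text{ for all } j,\ 1 \le j \le p,
\end{cases} \tag{2.8}$$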


in which $D^{(i)}_{y_i,x_i}$ denotes the set of discontinuity points of
$h_i$ in $[y_i,x_i)$, $h_i^{(c)}$ denotes the continuous part of $h_i$
(i.e. of its right continuous version), $x_{(i)} = (x_i,\ldots,x_p)$,

$$g_i(z_{(i)}) = E\{h_i(X_i) \mid X_{(i)} \ge z_{(i)}\} - h_i(z_i-),$$

$$g_i^*(z,x_{(i+1)}) = \frac{\big[g_i(z,x_{(i+1)}) - (h_i(z) - h_i(z-))\big]}{\big[g_i(z+,x_{(i+1)}) + (h_i(z+) - h_i(z))\big]}\cdot\frac{g_i(z+,x_{(i+1)})}{g_i(z,x_{(i+1)})}, \tag{2.9}$$

$b_i = \infty$ if
$\{y: \lim_{x_i \uparrow y} E\{h_i(X_i) \mid X_{(i)} \ge x_{(i)}\}$
exists and $\le h_i(y)\}$ is empty, and

$$b_i = \inf\{y: \lim_{x_i \uparrow y} E\{h_i(X_i) \mid X_{(i)} \ge x_{(i)}\} \text{ exists and } \le h_i(y)\}$$

otherwise, with $X_{(i)} = (X_i,\ldots,X_p)$.

(The conditional expectations are defined arbitrarily when the
conditioning sets are of measure zero; also (2.8) and (2.9) in the
statement above are to be read without $x_{(i+1)}$ in the case of
$i = p$.)

Remark  6. 

In  view  of  Theorem  3  and  the  information  given  in  the  Remarks  in 
Section  3  of  KSh  (1980),  it  is  possible  to  present  several  extensions 
and  variants  of  Theorem  2  given  above. 

Remark  7. 

If the $h_i$'s in Theorem 2 are assumed additionally to be continuous,
then the representation (2.7) with $G$ given by (2.8) without the term
$\{\prod_{z \in D^{(i)}_{y_i,x_i}} g_i^*(z,x_{(i+1)})\}$ and with the
$h_i^{(c)}$'s replaced by the $h_i$'s is valid.

Remark  8. 

If $h_i$ ($i = 1,2,\ldots,p$) of Theorem 2 are taken as strictly
increasing, the representation (2.7) for a survivor function is obviously
valid in the case of every distribution satisfying the integrability
condition of the theorem. One may be interested in seeing whether there
exists a representation for the survivor function for $X$ in terms of the
conditional expectations corresponding to a smaller number of functions,
which are appealing in some sense, at least when the domains of
definition of the $h_i$ are taken as Euclidean spaces with $h_i(X_i)$
considered above replaced by $h_i(X^{(i)})$, $X^{(i)}$ being a subvector
of $X$. However, it is not difficult to see that in general, merely with
the integrability condition, such a representation does not exist. This
could be verified by noting, for example, that if $h_i$,
$i = 1,2,\ldots,p-1$, are given to be real-valued Borel measurable
functions on $R^p$, then there exist random vectors $X$ and $Y$ with
distinct distributions having a common support (such as
$\{(0,\ldots,0), (1,0,\ldots,0), \ldots, (0,\ldots,0,1)\}$) such that

$$E\{h_i(X) \mid X \ge x\} = E\{h_i(Y) \mid Y \ge x\} \quad\text{for all } x \in R^p \text{ and } i = 1,2,\ldots,p-1.$$


Remark 9.


Prakasa Rao (1974) has essentially attempted to solve under some
constraints the problem mentioned in Remark 8. He has given in this
context a uniqueness theorem in the bivariate case under certain
assumptions. The following example shows that the theorem is not valid.


EXAMPLE 1.  Define $h$ to be a real-valued function on $R^2$ such that

$$h(x,y) = (1 - e^{-x})\,\xi(y),\quad x,y \in R^1,$$

where


5(y)  =  <  c  +  (x-2) 


+  (3-y)~ 


c  +  2  + 
c  +  2 


if  y  <  1 
if  1  <  y  <  2 


i  f  2  <  y  <  4 


i  f  4  <  y  £  5 
if  y  >  5, 


where $c$ is a positive number. Alternatively, one could consider the $h$
with a slightly more trivial situation of $\xi \equiv c$ for $c \ne 0$.
Let $(X,Y)$ and $(Z,W)$ be random vectors with absolutely continuous
independent non-negative components such that $X$ and $Z$ are identically
distributed but the distributions of $Y$ and $W$ are not identical. Also
assume the random vectors to be such that their marginal distributions
all have left extremities equal to zero and

$$P(Y < 1) = P(W < 1),\quad P(Y < y \mid Y > 1) = P(W < y \mid W > 1) \text{ for all } y > 1.$$

Observe that all the assumptions in Theorem 2.1 of Prakasa Rao (1974) are
satisfied with $x_0 = y_0 = 0$. Moreover, $(X,Y)$ and $(Z,W)$ satisfy
Prakasa Rao's stipulation (2.0). However, in this case, the conclusions
of the theorem are not valid. (It is obviously possible to illustrate
this point by other examples of a similar nature.)


Remark  10. 

In view of Theorem 2, characterizations based on e.r.l. functions are now
obvious for well known distributions such as the Marshall-Olkin bivariate
distribution, the Farlie-Gumbel-Morgenstern distribution discussed in
Johnson and Kotz (1975b), Gumbel's bivariate exponential distribution,
the multivariate Pareto distribution and several other multivariate
distributions appearing in Johnson and Kotz (1972). One could also apply
the theorem to arrive at further characterizations based on conditional
expectations for distributions such as Frechet's and those discussed by
Krishnaiah (1977). The following example may serve as an illustration of
this point.

EXAMPLE 2.  (Frechet's bivariate continuous distribution).

Consider $F$ to be the continuous d.f. on $R^2$ such that the
corresponding survivor function is given by

$$\bar F(x_1,x_2) = \min\{1 - F_1(x_1),\ 1 - F_2(x_2)\},\quad (x_1,x_2) \in R^2,$$

with $F_1$ and $F_2$ as univariate d.f.'s. Clearly, since $F$ is assumed
to be continuous, we require $F_1$ and $F_2$ to be continuous here also.
Define

$$h_i(x_i) = (F_i(x_i))^{\alpha_i},\quad x_i \in R^1,\ i = 1,2,$$

where $0 < \alpha_i < \infty$ and fixed. Then it follows that if
$X = (X_1,X_2)$ is a random vector with d.f. $F$, we have for every
$x\,(= (x_1,x_2)) \in R^2$ and $i = 1,2$,

$$E\{h_i(X_i) \mid X \ge x\} = \begin{cases}
(1+\alpha_i)^{-1}\{1 - (G(x))^{1+\alpha_i}\}\{1 - G(x)\}^{-1} & \text{if } G(x) < 1,\\[4pt]
1 & \text{if } G(x) = 1,
\end{cases}$$

where $G(x) = \max\{F_1(x_1), F_2(x_2)\}$. (On the set $\{G(x) = 1\}$,
one could also define $E\{h_i(X_i) \mid X \ge x\}$ differently.)
Obviously, given $\alpha_i$ and $F_i$,
$\{E(h_1(X_1) \mid X_1 \ge x_1, X_2 \ge x_2),\ E(h_2(X_2) \mid X_2 \ge x_2): (x_1,x_2) \in R^2\}$
characterizes the distribution considered above among all bivariate
distributions. (This distribution has several other interesting
characterization properties also, the recent characterization based on
discretized Shannon entropy given in Bertoluzza and Forte (1985) being
one of these.)
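The conditional expectation in Example 2 can be checked numerically. The sketch below relies on an assumption not spelled out in the text: under the Frechet survivor function with continuous marginals, both $F_1(X_1)$ and $F_2(X_2)$ coincide almost surely with a single Uniform(0, 1) variable $U$, and the event $\{X \ge x\}$ coincides with $\{U \ge G(x)\}$ up to null sets. The particular marginals, the point $(x_1, x_2)$ and the exponent `alpha` are hypothetical choices.

```python
import math

def cond_expectation_closed_form(g, alpha):
    """(1 + alpha)^(-1) * (1 - g^(1 + alpha)) / (1 - g), for g < 1."""
    return (1.0 - g ** (1.0 + alpha)) / ((1.0 + alpha) * (1.0 - g))

def cond_expectation_numeric(g, alpha, steps=100000):
    """E{U^alpha | U >= g} for U ~ Uniform(0, 1), by the midpoint rule."""
    du = (1.0 - g) / steps
    total = sum((g + (k + 0.5) * du) ** alpha for k in range(steps))
    return total * du / (1.0 - g)

def F1(x):
    """Hypothetical continuous marginal: standard exponential d.f."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def F2(x):
    """Hypothetical continuous marginal: standard logistic d.f."""
    return 1.0 / (1.0 + math.exp(-x))

x1, x2, alpha = 0.8, 0.3, 2.5
G = max(F1(x1), F2(x2))
closed = cond_expectation_closed_form(G, alpha)
numeric = cond_expectation_numeric(G, alpha)
```

The two values agree to within the quadrature error, and the closed form tends to 1 as $G(x) \to 1$, in line with the value assigned on $\{G(x) = 1\}$.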

3.  EXTENDED  VERSIONS  OF  THE  RESULTS  OF  BASU  AND  PURI 
AND  RUBIN  DEALING  WITH  THE  HAZARD  FUNCTION 

We  shall  now  discuss  a  rather  substantial  generalization  of  what 
is  known  in  the  literature  as  the  "scalar"  multivariate  hazard  function. 

Let, as in the previous section, $F$ be a d.f. on $R^p$, $X$ be a
p-component random vector with this distribution and $\bar F$ be the
corresponding survivor function. Denote by $P_F$ the measure determined
by $F$ on (the Borel $\sigma$-field $B_p$ of) $R^p$. Since, in the
multivariate case, we can have an $F$ such that
$P_F\{x: \bar F(x) = 0\} > 0$ (e.g., if we take $F$ to be continuous such
that $P_F\{x\,(= (x_1,\ldots,x_p)): x_1 = -x_2\} = 1$, we obtain
$P_F\{x: \bar F(x) = 0\} = 1$), the definition of a hazard measure in KSh
(1980) is not extendable as it stands. However, if we restrict ourselves
only to the set $C$ (say) of distributions $F$ for which
$\bar F(\cdot) > 0$ almost surely $[P_F]$, the definition in KSh (1980)
of a hazard measure admits an obvious extension. Suppose then that
$F \in C$ and define $\nu_F$ to be the scalar hazard measure on $R^p$
given by

$$\nu_F(B) = \int_B \frac{1}{\bar F(x)}\,dP_F(x) \quad\text{for all } B \in B_p. \tag{3.1}$$

The integral on the r.h.s. of the equation can be written, following the
accepted convention in the literature, as
$\int_B \frac{1}{\bar F(x)}\,dF(x)$.

In the case when $F$ is an absolutely continuous d.f. with respect to the
Lebesgue measure on $R^p$, $\nu_F$ also possesses this property and thus
the Radon-Nikodym derivative becomes the hazard function, studied by
earlier authors, a.e. on $\{x: \bar F(x) > 0\}$. It follows from the
investigations of Basu (1971) and Puri and Rubin (1973) (see also
Galambos and Kotz (1978)) that the measure $\nu_F$ does not in general
determine uniquely the distribution $F$. Consider then $\mathcal{D}_F$ to
be the set of d.f.'s on $R^p$ that are members of $C$ having the same
scalar hazard measure as $F$. Clearly the set $\mathcal{D}_F$ defined
herein is convex although not necessarily closed relative to weak
convergence.

Consider now the set of all d.f.'s on the compactified Euclidean space
$[-\infty,\infty]^p$. There exists a normed linear space of which this is
a compact subset with the corresponding relative metric as a metric of
weak convergence. Then, as a further subset of this compact set, the
closure $\bar{\mathcal{D}}_F$ of the set $\mathcal{D}_F$ is also compact.
(For simplicity we abuse the notation slightly here and elsewhere in this
section by denoting the set of all d.f.'s on $[-\infty,\infty]^p$ which
are extensions of members of $\mathcal{D}_F$ also by $\mathcal{D}_F$.)
Since $\bar{\mathcal{D}}_F$ is also convex, Choquet's theorem (cf. Phelps
(1965), page 19, and also Kendall (1963)) implies that each
$F^* \in \bar{\mathcal{D}}_F$ can be represented as the centroid or
barycenter of a probability measure on the Borel $\sigma$-field of the
linear space which is concentrated on the set of extreme points of
$\bar{\mathcal{D}}_F$. In general, the problem of obtaining the extreme
points of $\bar{\mathcal{D}}_F$ or merely of $\mathcal{D}_F$ seems to be
a difficult one and we have not as yet obtained any positive information
in this connection. However, through a theorem and two corollaries to
follow, we shall provide some valuable information concerning the problem
of characterizing $F$ on the basis of $\nu_F$. This gives, among other
things, the Poisson-Martin representation for $F$ in terms of $\nu_F$
when $F$ is continuous and a more natural extension of the univariate
hazard measure to the multivariate case than the hazard gradient of the
last section, possessing the uniqueness and stability requirements.

Before  discussing  our  main  results  of  this  section,  the  following 
instructive  examples  making  some  specific  points  are  worth  revealing. 

EXAMPLE 3.  Let

$$F(x) = \prod_{i=1}^{p} F_i(x_i),\quad x = (x_1,\ldots,x_p) \in R^p,$$

where $F_i$ are continuous d.f.'s on $R^1$. Then, appealing to the result
of Puri and Rubin (1973) or our observation above concerning a
representation for the members of $\bar{\mathcal{D}}_F$, we can easily
see that each member $F^* \in \mathcal{D}_F$ has the following form:

$$F^*(x) = \int\Big\{\prod_{i=1}^{p}\big(1 - (1 - F_i(x_i))^{\lambda_i}\big)\Big\}\,dG(\lambda),\quad x \in R^p, \tag{3.2}$$

where $G$ is a d.f. on $R^p$ such that the corresponding measure is
concentrated on the set
$\{\lambda: \lambda_i > 0,\ i = 1,2,\ldots,p,\ \prod_{i=1}^{p}\lambda_i = 1\}$.
Also, this can be seen via the Poisson-Martin integral representation
given for the members of $\mathcal{D}_F$ in Corollary 1 below.
Incidentally, in the present case, the extreme points of $\mathcal{D}_F$
are given precisely by the d.f.'s $F^*$ of the form

$$F^*(x) = \prod_{i=1}^{p}\{1 - (1 - F_i(x_i))^{\lambda_i}\},\quad x \in R^p,$$

with $\lambda_i > 0$, $i = 1,2,\ldots,p$, and
$\prod_{i=1}^{p}\lambda_i = 1$, and any extreme point of
$\bar{\mathcal{D}}_F$ is either an extreme point of $\mathcal{D}_F$ or a
d.f. on $[-\infty,\infty]^p$ which is the weak limit of a sequence of
extreme points of $\mathcal{D}_F$. Looking at an arbitrary member $F^*$
given by (3.2) in the case of $p \ge 2$ for $\mathcal{D}_F$, we observe a
curious property of $\mathcal{D}_F$: if $F^* \in \mathcal{D}_F$ and any
$p-1$ of the $p$ univariate marginals of $F^*$ agree with those
corresponding to $F$, then $F^* = F$. In other words, we have in this
case that if a d.f. on $R^p$ has $p-1$ of its univariate marginals
precisely the same as those corresponding to $F$ and its scalar hazard
measure on $R^p$ is defined and is given by $\nu_F$, then this d.f. has
to be $F$. Since every univariate d.f. is uniquely determined by its
hazard measure, we could also restate this property using only hazard
measures. (For some recent advances connected with the results discussed
herein, see Lau and Rao (1982), Rao and Shanbhag (1986) and Davies and
Shanbhag (1987).)
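The non-uniqueness described in Example 3 can be observed numerically in the absolutely continuous bivariate case, where the scalar hazard measure has density $f(x)/\bar F(x)$. The sketch below is an illustration under assumptions: it takes two hypothetical marginal survivor functions, $\exp(-x)$ and $\exp(-x^2)$ on $x > 0$, raises them to powers $\lambda_1, \lambda_2$ with $\lambda_1\lambda_2 = 1$, and estimates the density by a mixed second difference of the survivor function.

```python
import math

def make_survivor(lam1, lam2):
    """Survivor function with marginal survivors exp(-x1)**lam1 and
    exp(-x2**2)**lam2 (illustrative marginals, independent components)."""
    return lambda x1, x2: math.exp(-lam1 * x1 - lam2 * x2 ** 2)

def scalar_hazard(sf, x1, x2, eps=1e-3):
    """f(x)/sf(x), with the density f approximated by the mixed second
    central difference of the bivariate survivor function sf."""
    mixed = (sf(x1 + eps, x2 + eps) - sf(x1 + eps, x2 - eps)
             - sf(x1 - eps, x2 + eps) + sf(x1 - eps, x2 - eps)) / (4 * eps ** 2)
    return mixed / sf(x1, x2)

base = scalar_hazard(make_survivor(1.0, 1.0), 0.4, 0.9)
tilted = scalar_hazard(make_survivor(4.0, 0.25), 0.4, 0.9)   # lam1 * lam2 == 1
```

Both distributions have hazard density $2x_2$ at the chosen point although their marginals differ, which is the phenomenon the example describes.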


EXAMPLE 4.  Let $p \ge 2$, $k$ be a real number and $S$ be a countable
subset of $R^{p-1}$. Also let $\mathcal{L}$ denote the set of d.f.'s on
$R^{p-1}$ that are concentrated on $S$ giving a positive probability mass
to each point of $S$. For each $G \in \mathcal{L}$, let $F_G$ denote the
d.f. on $R^p$ which is concentrated on
$\{x: x \in R^p,\ \sum_{i=1}^{p} x_i = k\}$ with

$$F_G(x_1,\ldots,x_{p-1},\infty) = G(x_1,\ldots,x_{p-1}),\quad (x_1,\ldots,x_{p-1}) \in R^{p-1}$$

(in the usual notation). It is easily seen that here the $\nu_{F_G}$ are
all (well defined and) identical. If we now consider $p \ge 4$ and any of
the $F_G$'s to be $F$, then it is clearly seen that the condition that
$F^* \in \mathcal{D}_F$ does not imply $F^* = F$ even if it is given that
$F^*$ has all of its univariate marginals or bivariate marginals to be
the same as those of $F$. However, for the $F$ in this example, the
condition that $F^* \in \mathcal{D}_F$ together with

$$F^*(x_1,\ldots,x_{p-1},\infty) = F(x_1,\ldots,x_{p-1},\infty),\quad (x_1,\ldots,x_{p-1}) \in R^{p-1},$$

implies that $F^* = F$. Note also that here we have the set of extreme
points of $\mathcal{D}_F$ to be empty and the set of extreme points of
$\bar{\mathcal{D}}_F$ to be the closure (relative to weak convergence) of
the set of the degenerate d.f.'s on $[-\infty,\infty]^p$ that are
concentrated on $\{x: x \in R^p,\ \sum_i x_i = k\}$; clearly now the
situation of the last example that each $F^* \in \mathcal{D}_F$ has an
integral representation in terms of the extreme points of $\mathcal{D}_F$
is not valid.
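The claim in Example 4 that all the scalar hazard measures coincide can be seen on a small finite instance. The sketch below assumes $p = 3$, a hypothetical four-point set $S \subset R^2$ and two arbitrary mass assignments on it; each point of $S$ is lifted to the plane $x_1 + x_2 + x_3 = k$. Since two distinct points of that plane cannot be ordered componentwise, the survivor function at each support point equals the mass at that point, and every atom of the hazard measure equals one.

```python
def scalar_hazard_atoms(pmf):
    """nu({x}) = P({x}) / P(X >= x) at each support point x,
    with >= understood componentwise."""
    def surv(x):
        return sum(q for y, q in pmf.items()
                   if all(yi >= xi for yi, xi in zip(y, x)))
    return {x: q / surv(x) for x, q in pmf.items()}

k = 1.0
S = [(0.0, 0.0), (1.0, -2.0), (0.5, 3.0), (-1.0, 0.25)]   # hypothetical S in R^2

def lift(s):
    """Map a point of S to the point of R^3 whose coordinates sum to k."""
    return s + (k - sum(s),)

G1 = dict(zip(S, [0.1, 0.2, 0.3, 0.4]))   # two different mass assignments on S
G2 = dict(zip(S, [0.7, 0.1, 0.1, 0.1]))
nu1 = scalar_hazard_atoms({lift(s): q for s, q in G1.items()})
nu2 = scalar_hazard_atoms({lift(s): q for s, q in G2.items()})
```

Both `nu1` and `nu2` assign mass one to every lifted point, so the two distinct distributions share the same scalar hazard measure.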

In spite of certain isolated cases, such as that of Fréchet's distribution
of Example 2 or of a d.f. $F$ that satisfies for some $b \in R^p$
the conditions $F(b) = 1$ and $\bar F(b) = P_F(\{b\}) > 0$, in which the $F$ is
characterized by $\nu_F$, it now follows that, in general, unless at least one of
the $(p-1)$-variate marginals of the distribution (or something equivalent
to it) is given, $\nu_F$ does not characterize $F$. One might then be interested
to know whether $F$ is characterized by $\nu_F$ given any one of the $(p-1)$-variate
marginals. Our attempt to answer this question has been only partially
successful so far and the findings of this investigation are presented,
among other things, in the following results.


We are now ready to give our main theorem of the section together
with two of its interesting corollaries. (The reader can find some
analogy between the proof of the theorem given here and Seneta's (1981)
proof of the Poisson-Martin integral representation theorem for a
super-regular vector corresponding to a non-negative matrix.)

THEOREM 3. If $F^* \in \mathcal{D}_F$ and, for each $i = 1,2,\ldots,p$, we have (in the
standard notation)

$$F^*(x_1,\ldots,x_{i-1},\infty,x_{i+1},\ldots,x_p) = F(x_1,\ldots,x_{i-1},\infty,x_{i+1},\ldots,x_p)$$

$$\text{for all } x_j \in R,\ j = 1,2,\ldots,i-1,i+1,\ldots,p, \tag{3.3}$$

then $F^* = F$. Furthermore, given an $F^* \in \bar{\mathcal{D}}_F$, there exists a probability
measure $\mu$ on the set of all d.f.'s $G$ on $[-\infty,\infty]^p$ such that

$$F^*(x) = \int G(x)\,d\mu(G),\quad x \in R^p, \tag{3.4}$$

where $\mu(\mathcal{K}) = 1$ and $\mathcal{K}$ is the closure (relative to weak convergence) of the
set of the d.f.'s $K_t(\cdot)$ for $t$ such that $F(t), \bar F(t) > 0$ ($\bar F$ being the
survivor function of $F$ as in the last section), where each of the $K_t(\cdot)$ is
defined to be a d.f. on $[-\infty,\infty]^p$ such that it is the degenerate d.f. at
$t$ if $P_F(\{t\}) = \bar F(t)$ and the d.f. satisfying the following otherwise:


$$K_t(x) = \frac{k(x,t)}{k(t,t)} \tag{3.5}$$

with

$$k(y,t) = \Delta_t(y) + \sum_{n=1}^{\infty} \int_{(-\infty,y]}\int_{[y_1,t]}\cdots\int_{[y_{n-1},t]} d\nu_F(y_n)\cdots d\nu_F(y_1),\quad y \in (-\infty,t],$$

$\Delta_t(\cdot)$ being the d.f. degenerate at $t$. (The proof of the theorem asserts
that $K_t(\cdot)$ is well defined.)


Proof. In view of Fubini's Theorem and the relation $P_F(B) = \int_B \bar F(x)\,d\nu_F(x)$
for every Borel subset $B$ of $R^p$, we have for each $t$ such that
$\bar F(t) > P_F(\{t\})$ and $n \ge 1$,


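$$
\begin{aligned}
\int_{(-\infty,t]}\int_{(-\infty,y_n]}\cdots\int_{(-\infty,y_2]} d\nu_F(y_1)\cdots d\nu_F(y_n)
&= \int_{(-\infty,t]}\int_{[y_1,t]}\cdots\int_{[y_{n-1},t]} d\nu_F(y_n)\cdots d\nu_F(y_1) \\
&\le \frac{1}{\bar F(t)} \int_{(-\infty,t]}\int_{[y_1,t]}\cdots\int_{[y_{n-1},t]} \bar F(y_n)\,d\nu_F(y_n)\cdots d\nu_F(y_1) \\
&\le \frac{a_t}{\bar F(t)} \int_{(-\infty,t]}\int_{[y_1,t]}\cdots\int_{[y_{n-2},t]} \bar F(y_{n-1})\,d\nu_F(y_{n-1})\cdots d\nu_F(y_1) \\
&\le \cdots \le \frac{a_t^{\,n-1}}{\bar F(t)} \int_{(-\infty,t]} \bar F(y_1)\,d\nu_F(y_1) = \frac{a_t^{\,n-1}}{\bar F(t)}\,F(t),
\end{aligned}
\tag{3.6}
$$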


where $a_t = F(t)/\{F(t) + \bar F(t) - P_F(\{t\})\} < 1$, since $a_t \ge P_F([y,t])/\bar F(y)$
for $y \le t$. (3.6) establishes, among other things, that $K_t(\cdot)$ in the
statement of the theorem is well defined. Now, for each d.f. $F^*$ on $R^p$ such
that $F^* \in \mathcal{D}_F$ and $t$ as in (3.6), we have, in view of the relation
$P_{F^*}(B) = \int_B \bar F^*(x)\,d\nu_F(x)$ with $B$ as an arbitrary Borel set and $\bar F^*$ as the
survivor function corresponding to $F^*$,


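$$
\begin{aligned}
\bar F^*(t) &= \xi_0(t) + (-1)^p P_{F^*}((-\infty,t)) \\
&= \xi_0(t) + (-1)^p \int_{(-\infty,t)} \bar F^*(x)\,d\nu_F(x) \\
&= \xi_1(t) + (-1)^{2p} \int_{(-\infty,t)} P_{F^*}((-\infty,x))\,d\nu_F(x) \\
&= \cdots \\
&= \xi_n(t) + (-1)^{(n+1)p} \int_{(-\infty,t)}\int_{(-\infty,x_n)}\cdots\int_{(-\infty,x_2)} P_{F^*}((-\infty,x_1))\,d\nu_F(x_1)\cdots d\nu_F(x_n),\quad n \ge 1,
\end{aligned}
\tag{3.7}
$$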


where the sequence $\{\xi_m(t) : m = 0,1,\ldots\}$ (for each given $t$) is such that
it depends only on $\nu_F$ and the d.f.'s $F^*(x_1,\ldots,x_{i-1},\infty,x_{i+1},\ldots,x_p)$,
$(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_p) \in R^{p-1}$, $i = 1,\ldots,p$. It follows trivially
from (3.6) that the multiple integral on the r.h.s. of (3.7) tends to
zero as $n \to \infty$. This in turn implies that the sequence $\{\xi_n(t) : n = 1,2,\ldots\}$
in (3.7) converges to $\bar F^*(t)$ and hence we have that if (3.3) is valid,

$$\bar F^*(t) = \bar F(t) \text{ for each } t \text{ such that } \bar F(t) > P_F(\{t\}). \tag{3.8}$$

In view of the left continuity of $\bar F$ and $\bar F^*$ and the fact that $\{x : x \in R^p,\ \bar F(x) = 0\} = \{x : x \in R^p,\ \bar F^*(x) = 0\}$, we can conclude that if (3.8) is
valid, then we have $\bar F^* = \bar F$ or equivalently $F^* = F$. This establishes the
first part of the theorem.
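
The geometric bound in (3.6) can be checked numerically in the simplest setting. The following is a minimal sketch, assuming $p = 1$ and a finitely supported distribution, so that the relation $P_F(B) = \int_B \bar F\,d\nu_F$ gives $\nu_F(\{y\}) = P_F(\{y\})/\bar F(y)$ at each atom. The support, the weights and all function names are hypothetical choices made for illustration only.

```python
from fractions import Fraction

# Hypothetical univariate example (p = 1): four equally likely atoms.
SUPPORT = [1, 2, 3, 4]
PROB = {y: Fraction(1, 4) for y in SUPPORT}


def survivor(y):
    """Survivor function F-bar(y) = P([y, infinity))."""
    return sum(PROB[z] for z in SUPPORT if z >= y)


def cdf(y):
    """Distribution function F(y) = P((-infinity, y])."""
    return sum(PROB[z] for z in SUPPORT if z <= y)


def hazard(y):
    """Hazard mass nu({y}) = P({y}) / F-bar(y), from P(B) = int_B F-bar dnu."""
    return PROB[y] / survivor(y)


def inner(k, y, t):
    """k-fold iterated integral over y <= y_1 <= ... <= y_k <= t."""
    if k == 0:
        return Fraction(1)
    return sum(hazard(z) * inner(k - 1, z, t) for z in SUPPORT if y <= z <= t)


def iterated_integral(n, t):
    """Left-hand side of (3.6): n-fold integral over y_1 <= ... <= y_n <= t."""
    return sum(hazard(y) * inner(n - 1, y, t) for y in SUPPORT if y <= t)


def a_t(t):
    """The constant a_t = F(t) / (F(t) + F-bar(t) - P({t}))."""
    return cdf(t) / (cdf(t) + survivor(t) - PROB[t])


def bound(n, t):
    """Right-hand side of (3.6): a_t^(n-1) * F(t) / F-bar(t)."""
    return a_t(t) ** (n - 1) * cdf(t) / survivor(t)


if __name__ == "__main__":
    t = 2  # satisfies F-bar(t) > P({t})
    for n in range(1, 6):
        print(n, iterated_integral(n, t), bound(n, t))
```

Exact rational arithmetic is used so that the inequality is compared without rounding error; the summability of these terms in $n$ is what makes $k(t,t)$ finite.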


To establish the second part of the theorem, define

$$B = \{t : t \in R^p,\ F(t), \bar F(t) > 0\},$$

$$B_0 = \{t : t \in R^p,\ \bar F(t) = P_F(\{t\}) > 0\},$$

$$B_m = \{t : t \in B,\ \bar F(t) \ge P_F(\{t\}) + m^{-1}\},\quad m = 1,2,\ldots.$$


If $F^* \in \mathcal{D}_F$, then by the monotone convergence theorem, we get

$$
F^*(x) = \int_{(-\infty,x]} \bar F^*(y)\,d\nu_F(y)
= \int_{B_0} K_t(x)\,dP_{F^*}(t) + \lim_{m\to\infty} \int_{B_m} \Delta_t(x)\,\bar F^*(t)\,d\nu_F(t),\quad x \in R^p.
\tag{3.9}
$$


Now, for every $m \ge 1$ and $t \in B_m$, $a_t$ of (3.6) is bounded by $m/(m+1)$
and hence it follows from (3.6) that $k(t,t)$ is bounded on $B_m$ for each
$m \ge 1$. Also, if we define $\bar k(x,t) = k(t,t) - k(x-,t)$ for each $t \in B \setminus B_0$
and $x \in R^p$ (with this to be zero if $x \notin (-\infty,t]$), Fubini's Theorem
implies that for each $m \ge 1$ and $x \in R^p$


$$
\int_{B_m} \bar k(x,t)\,\bar F^*(t)\,d\nu_F(t)
= \int_{B_m} \Delta_t(x)\,\bar F^*(t)\,d\nu_F(t) + \int_{B_m} \bar k(x,t)\,P_{F^*}(B_m(t))\,d\nu_F(t),
\tag{3.10}
$$


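where $B_m(t) = [t,\infty) \cap B_m$. Observe that (3.10) follows easily from the
relations:

$$\bar k(x,t) = \Delta_t(x) + \sum_{n=1}^{\infty} \int_{[x,t]}\int_{[x,y_n]}\cdots\int_{[x,y_2]} d\nu_F(y_1)\cdots d\nu_F(y_n),\quad x \le t,\ t \in B \setminus B_0,$$

$$\int_{B_m(y_n)} \bar F^*(t)\,d\nu_F(t) = P_{F^*}(B_m(y_n)),\quad y_n \in B \setminus B_0,\ m,n \ge 1.$$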


From (3.9) and (3.10), it consequently follows that there exists a
sequence $\{\mu_m : m = 1,2,\ldots\}$ of measures on $R^p$ such that $\mu_m(R^p) \le 1$ for
all $m$ and

$$F^*(x) = \lim_{m\to\infty} \int_B K_t(x)\,d\mu_m(t),\quad x \in R^p, \tag{3.11}$$

which in turn implies that $\{\mu_m(B)\}$ converges to 1 and hence that there
exists a sequence $\{\mu_m\}$ of probability measures on $R^p$ for which (3.11) is
valid. Since $\mathcal{K}$ is compact, using Parthasarathy's (1967) Theorem 6.4,
it can then be easily seen that there exists a probability measure $\mu^*$
on $\mathcal{K}$ such that

$$F^*(x) = \int_{\mathcal{K}} G(x)\,d\mu^*(G),\quad x \in R^p.$$

Since $\bar{\mathcal{D}}_F$ is the closure of $\mathcal{D}_F$, a further application of Parthasarathy's
Theorem yields the validity of the second part of our theorem.

The following two corollaries of Theorem 3 are easy to prove:

COROLLARY 1. (The Poisson-Martin representation): If $F$ is continuous,
then we have a d.f. $F^*$ on $R^p$ to be a member of $\mathcal{D}_F$ if and only if it has
a representation

$$F^*(x) = \int_{\mathcal{K} \cap \mathcal{D}_F} G(x)\,d\mu(G),\quad x \in R^p,$$

for some probability measure $\mu$ on $\mathcal{K} \cap \mathcal{D}_F$. (In the present case, we also
have $\mathcal{K} \cap \mathcal{D}_F$ to be a $G_\delta$ set of the space of all d.f.'s on $[-\infty,\infty]^p$.)

COROLLARY 2. The hazard measure $\nu_F$ jointly with the hazard measures
relative to all the univariate and multivariate marginals of $F$ determines
$F$ uniquely. (This corollary can be verified by induction.)

Remark 11.

It can be noted that the result of Corollary 1 does not remain valid
if the assumption that $F$ is continuous is dropped. Also, in view of what
we have observed, it can be concluded that if $F$ is continuous, then we
have the set of extreme points of $\mathcal{D}_F$ to be a nonempty subset of $\mathcal{K}$ and
each $F^* \in \mathcal{D}_F$ to be the barycenter of a probability measure that
is carried by the set of extreme points of $\mathcal{D}_F$.

Remark 12.

The finite collection of hazard measures given in Corollary 2
appears, in spite of the restriction that $F \in \mathcal{C}$, to be a more natural
multivariate analogue of the univariate hazard measure than the hazard
gradient of the last section. A stability theorem for the collection
is valid when $F$ is continuous, as is shown by Corollary 3 of the next
section.


4.  A  STABILITY  THEOREM 



We conclude the paper by proving, and commenting on, a general
stability theorem for probability measures on metric spaces,
which yields, among other things, the two stability propositions in
KSh (1980) as simple corollaries. The proof of the present theorem uses
Prohorov's (1956) theorem and related theorems in Billingsley (1968) dealing
with the convergence of probability measures. It might be instructive
to compare this with the proofs of the earlier stability propositions in KSh
(1980). The techniques used for proving the theorem here are indeed of
a more global nature than those which are sufficient in the case of
probability measures on the real line.

Now, let $S$ be a metric space, $T$ an index set, $\mathcal{S}$ the Borel $\sigma$-field
on $S$, $\mathcal{P}, \mathcal{P}_1, \mathcal{P}_2, \mathcal{P}_3$ families of probability measures on $(S,\mathcal{S})$, $\{\mathcal{A}_t : t \in T\}$
a family of collections of sets with $\mathcal{A}_t \subset \mathcal{S}$ for every $t \in T$, and
$\{h(\cdot\,|\,t,A_t,P) : A_t \in \mathcal{A}_t,\ P \in \mathcal{P},\ t \in T\}$ a family of real-valued Borel
measurable functions on $(S,\mathcal{S})$ satisfying the following conditions, in
which the notation $D(t,A_t,P)$ stands for the set of discontinuity points
of $h(\cdot\,|\,t,A_t,P)$:


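(i) $\mathcal{P}_1, \mathcal{P}_2, \mathcal{P}_3 \subset \mathcal{P}$; also $\mathcal{P}_2$ is closed (under weak convergence),

(ii) $P_1^{(1)}, P_2^{(1)}, \ldots \in \mathcal{P}_1$ and $\{P_n^{(1)} : n \ge 1\}$ converges weakly
to $P^* \in \mathcal{P}$ $\Rightarrow$ $h(\cdot\,|\,t,A_t,P_n^{(1)}) \to h(\cdot\,|\,t,A_t,P^*)$ as $n \to \infty$ uniformly almost
surely $[P^*]$ on $A_t \cap D^c(t,A_t,P^*)$ and

$$\sup_{n \ge 1} E_{P_n^{(1)}}\Big\{ |h(\cdot\,|\,t,A_t,P_n^{(1)})|\; I_{A_t \cap \{|h(\cdot\,|\,t,A_t,P_n^{(1)})| \ge a\}} \Big\} \to 0$$

as $a \to \infty$ for each $t \in T$ and $P^*$-continuity set $A_t$ with $P^*(A_t)$ positive,

(iii) $P_1^{(2)}, P_2^{(2)} \in \mathcal{P}_2$ and are distinct $\Rightarrow$ there exist $t \in T$ and $A_t \in \mathcal{A}_t$
such that $P_1^{(2)}(A_t)$, $P_2^{(2)}(A_t)$ are both positive, $A_t$ is both a $P_1^{(2)}$-continuity
set and a $P_2^{(2)}$-continuity set and

$$E_{P_1^{(2)}}\{h(\cdot\,|\,t,A_t,P_1^{(2)})\,|\,A_t\} \ne E_{P_2^{(2)}}\{h(\cdot\,|\,t,A_t,P_2^{(2)})\,|\,A_t\},$$

(iv) $P^{(3)} \in \mathcal{P}_3$ $\Rightarrow$ $D(t,A_t,P^{(3)})$ has zero $P^{(3)}$-measure for every $t$ in $T$
and $P^{(3)}$-continuity set $A_t$ in $\mathcal{A}_t$.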


Further, let $P \in \mathcal{P}$ and $\{P_n : n \ge 1\}$ be a sequence of members of $\mathcal{P}_1$
such that $\{P_n : n = 1,2,\ldots\}$ is relatively compact. Then we have the
following stability theorem:


THEOREM 4. (a) The condition that

$$P \in \mathcal{P}_3,\quad \{P_n : n \ge 1\} \text{ converges weakly to } P \tag{4.1}$$

implies that

$$E_{P_n}\{h(\cdot\,|\,t,A_t,P_n)\,|\,A_t\} \to E_P\{h(\cdot\,|\,t,A_t,P)\,|\,A_t\} \tag{4.2}$$

as $n \to \infty$ for every $t \in T$ and $P$-continuity set $A_t \in \mathcal{A}_t$ with $P(A_t) > 0$.
Moreover, (b) if additionally $P, P_1, P_2, \ldots \in \mathcal{P}_2$ and the set of cluster
points of $\{P_n : n = 1,2,\ldots\}$ (relative to weak convergence) is a subset
of $\mathcal{P}_3$, then the converse assertion is valid.


Proof. Assume first that (4.1) is valid. Since $P \in \mathcal{P}_3$, it is obvious
that the set of discontinuity points of $h(\cdot\,|\,t,A_t,P)\,I_{A_t}$ has zero $P$-measure
for every $t \in T$ and $P$-continuity set $A_t \in \mathcal{A}_t$. Now, let $t \in T$ and the
$P$-continuity set $A_t \in \mathcal{A}_t$ be arbitrarily fixed. Since $P_n \in \mathcal{P}_1$, $n \ge 1$, the
requirements of Billingsley's (1968) Theorem 5.5 are clearly met with
$h(\cdot\,|\,t,A_t,P)\,I_{A_t}$ as $h$ and $h(\cdot\,|\,t,A_t,P_n)\,I_{A_t}$ as $h_n$. This theorem implies
that $\{P_n h_n^{-1} : n = 1,2,\ldots\}$ converges weakly to $P h^{-1}$. If we now consider
$X_n$, $n \ge 1$, and $X$ to be some random variables having distributions $P_n h_n^{-1}$,
$n \ge 1$, and $P h^{-1}$ respectively, we have $\{X_n : n = 1,2,\ldots\}$ converging to $X$
in distribution. Also, the fact that $P_n \in \mathcal{P}_1$, $n \ge 1$, implies that
$\{X_n : n = 1,2,\ldots\}$ considered here is uniformly integrable. Since
Billingsley's (1968) Theorem 5.4 yields that $E\{X_n\} \to E\{X\}$ as $n \to \infty$ in
such a situation, we can conclude that

$$E_{P_n}\{h(\cdot\,|\,t,A_t,P_n)\,I_{A_t}\} \to E_P\{h(\cdot\,|\,t,A_t,P)\,I_{A_t}\} \text{ as } n \to \infty. \tag{4.3}$$

In view of the assumptions that $\{P_n\}$ converges weakly to $P$ and $A_t$ is a
$P$-continuity set, it follows that $P_n(A_t) \to P(A_t)$ as $n \to \infty$. If $P(A_t) > 0$,
we have (4.2) then as an obvious consequence of (4.3). Hence we have the
first part of the stability theorem to be valid.
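
The conclusion (4.2) can be illustrated numerically. The following is a minimal sketch, assuming $S = R$, $P_n$ the uniform distribution on $\{1/n, 2/n, \ldots, 1\}$ (which converges weakly to the uniform distribution $P$ on $[0,1]$), the function $h(x) = x$ not depending on $t$, $A_t$ or $P$, and the $P$-continuity set $A = (-\infty, 1/2]$. These choices and the function name `conditional_mean` are hypothetical and serve only as an illustration; here $E_P\{h\,|\,A\} = 1/4$.

```python
from fractions import Fraction


def conditional_mean(n, upper=Fraction(1, 2)):
    """E_{P_n}{h | A} for h(x) = x, A = (-inf, upper], P_n uniform on {k/n}."""
    atoms = [Fraction(k, n) for k in range(1, n + 1)]
    in_a = [x for x in atoms if x <= upper]
    # Each atom has mass 1/n, so the masses cancel in the conditional mean.
    return sum(in_a) / len(in_a)


LIMIT = Fraction(1, 4)  # E_P{h | A} for P uniform on [0, 1]

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, float(conditional_mean(n)), float(LIMIT))
```

In this sketch $h$ is bounded on the common support, so the uniform integrability requirement is met trivially, and $h$ is continuous, so the discontinuity set is empty.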

To establish that the second part of the theorem holds, assume that
$P, P_1, P_2, \ldots \in \mathcal{P}_2$ and the set of cluster points of $\{P_n : n = 1,2,\ldots\}$
is a subset of $\mathcal{P}_3$ and also that (4.2) is valid. Since each cluster point
of $\{P_n : n = 1,2,\ldots\}$ is an element of $\mathcal{P}_3$ and $\{P_n : n = 1,2,\ldots\}$ is
relatively compact, we should have a subsequence $\{P_{n_r} : r = 1,2,\ldots\}$ of
$\{P_n : n = 1,2,\ldots\}$ converging weakly to $Q \in \mathcal{P}_3$ with $Q \ne P$ unless (4.1) is
valid. If $Q^*$ denotes the (weak) limit of a subsequence of $\{P_n\}$,
then clearly we have $Q^* \in \mathcal{P}_3$ and hence the first part of the theorem and
the validity of (4.2) lead us to

$$E_P\{h(\cdot\,|\,t,A_t,P)\,|\,A_t\} = E_{Q^*}\{h(\cdot\,|\,t,A_t,Q^*)\,|\,A_t\} \tag{4.4}$$

for every $t \in T$ and $A_t \in \mathcal{A}_t$ such that $A_t$ is a $P$-continuity set with
$P(A_t) > 0$ as well as a $Q^*$-continuity set with $Q^*(A_t) > 0$. We have assumed
that $P \in \mathcal{P}_2$ and for each $n \ge 1$, $P_n \in \mathcal{P}_2$ and also we have $\mathcal{P}_2$ to be closed.
In that case, we have $P, Q^* \in \mathcal{P}_2$ and hence, in view of (4.4) and condition (iii), $Q^* = P$. It
is therefore impossible that (4.1) will not be valid. Hence we have the
second part of the theorem.


Remark 13.

In the case of $h(\cdot\,|\,t,A_t,P)$ being independent of $P$, obviously the
part of condition (ii) that $h(\cdot\,|\,t,A_t,P_n^{(1)}) \to h(\cdot\,|\,t,A_t,P^*)$ uniformly
almost surely $[P^*]$ on $A_t \cap D^c(t,A_t,P^*)$ for every $t \in T$ and $P^*$-continuity
set $A_t$ with $P^*(A_t) > 0$ is trivially met. Also, if $h(\cdot\,|\,t,A_t,P)$ are all
continuous, then the condition (iv) above is obviously satisfied with
$\mathcal{P}_3 = \mathcal{P}$. If $S$ is a Polish space or, in particular, if it is a Euclidean
space, we have a sequence $\{P_n : n = 1,2,\ldots\}$ of members of $\mathcal{P}$ to be
relatively compact if and only if it is tight in the sense of Billingsley
(1968: p. 37) (cf. Theorems 6.1 and 6.2 in Billingsley (1968)). Thus, it
is evident that in various specialized situations, the theorem given
above has simplified and perhaps more appealing versions.


Remark 14.

If the stipulation "the set of cluster points of $\{P_n : n = 1,2,\ldots\}$
is a subset of $\mathcal{P}_3$" is replaced by the weaker stipulation "the set of
cluster points of the range of $\{P_n : n = 1,2,\ldots\}$ is a subset of $\mathcal{P}_3$",
Theorem 4 still remains valid provided we also replace "the converse
assertion is valid" by "(4.2) implies that $\{P_n : n = 1,2,\ldots\}$ converges
weakly to $P$".


Remark 15.

To illustrate that the stability theorem just proved does not remain
valid if the assumptions "$P \in \mathcal{P}_3$" and "the set of cluster points of
$\{P_n : n = 1,2,\ldots\}$ is a subset of $\mathcal{P}_3$", respectively appearing in the two
parts of the theorem, are omitted, it is sufficient to consider the
following example:


EXAMPLE 5. Let $\{x_n : n = 1,2,\ldots\}$ be a strictly increasing sequence of
real numbers converging to a real number $x'$. Let $x''$ be a real number
greater than $x'$. Define $P$, $P'$, $\{P_n : n = 1,2,\ldots\}$ to be
probability measures on the Borel $\sigma$-field of $R$ such that for some
$0 < a < 1$

$$P_n(\{x\}) = \begin{cases} a & \text{if } x = x_n \\ 1 - a & \text{if } x = x'', \end{cases}$$

$$P(\{x\}) = \begin{cases} a & \text{if } x = x' \\ 1 - a & \text{if } x = x'', \end{cases}$$

$$P'(\{x\}) = \begin{cases} a + \dfrac{a(d-c)}{x''-x'} & \text{if } x = x' \\[2ex] 1 - a - \dfrac{a(d-c)}{x''-x'} & \text{if } x = x'', \end{cases}$$

where $c$ and $d$ are given real numbers such that $c < d$ and $\{a(d-c)/(x''-x')\}
< 1 - a$. Also, define $h$ on $R$ such that

$$h(x) = \begin{cases} c & \text{if } x < x' \\ d + (x - x') & \text{if } x \ge x'. \end{cases}$$

If we take $T$ = the singleton $\{1\}$, $\mathcal{A}_1 = \{(-\infty,x) : -\infty < x < x''\}$,
$\mathcal{P} = \{P, P', P_1, P_2, \ldots\}$ and $h(\cdot\,|\,1,A,P^*) = h(\cdot)$ for every member $A$ of $\mathcal{A}_1$
and $P^* \in \mathcal{P}$, then it follows that $\mathcal{P}$ itself satisfies the requirements
of $\mathcal{P}_1$ and $\mathcal{P}_2$ mentioned above. However, in this case we cannot have
a nonempty subset of $\mathcal{P}$ satisfying the condition (iv) as required.
Consequently, it follows that in this example neither the requirement
of $P \in \mathcal{P}_3$ nor the requirement of the set of cluster points of
$\{P_n : n = 1,2,\ldots\}$ being a subset of $\mathcal{P}_3$ is met. Observe that here
$\{P_n : n = 1,2,\ldots\}$ converges to $P$ weakly, $P \ne P'$ and (4.2) is not valid
(since $E_{P_n}\{h(\cdot)\,|\,A\} \not\to E_P\{h(\cdot)\,|\,A\}$ whenever $A = (-\infty,x)$ with $x < x'$) but
(4.2) with $P$ replaced by $P'$ is valid. This implies that with the
deletions mentioned above neither the first part of the theorem nor the
second part remains valid.
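
The role of the extra mass $a(d-c)/(x''-x')$ in $P'$ can be seen from a short computation. The following is a minimal sketch, assuming the hypothetical values $a = 1/4$, $c = 0$, $d = 1$, $x' = 0$, $x'' = 2$ and $x_n = x' - 1/n$, and taking expectations of $h$ over the whole real line rather than conditioning on a member of $\mathcal{A}_1$ (a simplification made here for illustration, not the conditioning used in the example). Under these assumptions the mean of $h$ is the same under every $P_n$ and under $P'$, while it differs under the weak limit $P$.

```python
from fractions import Fraction

# Hypothetical parameter values satisfying c < d and a(d-c)/(x''-x') < 1 - a.
A_MASS = Fraction(1, 4)            # the constant a
C, D = Fraction(0), Fraction(1)    # the constants c and d
X1, X2 = Fraction(0), Fraction(2)  # the points x' and x''
SHIFT = A_MASS * (D - C) / (X2 - X1)  # mass moved from x'' to x' in P'


def h(x):
    """The function h of Example 5."""
    return C if x < X1 else D + (x - X1)


def mean(measure):
    """Expectation of h under a finitely supported measure {point: mass}."""
    return sum(mass * h(x) for x, mass in measure.items())


def p_n(n):
    """P_n with x_n = x' - 1/n (one admissible increasing sequence)."""
    return {X1 - Fraction(1, n): A_MASS, X2: 1 - A_MASS}


P = {X1: A_MASS, X2: 1 - A_MASS}
P_PRIME = {X1: A_MASS + SHIFT, X2: 1 - A_MASS - SHIFT}

if __name__ == "__main__":
    print(mean(p_n(5)), mean(P_PRIME), mean(P))
```

The jump of $h$ at $x'$, where $P$ has positive mass, is what separates the limit of the means under $P_n$ from the mean under $P$.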

Theorem 4 has several interesting corollaries. In particular it
yields that if a characteristic property exists, based on conditional
expectations of the type $E_P\{h(\cdot\,|\,t)\,|\,A_t\}$ for probability measures $P$ within
a certain class, then, under certain mild conditions, one can produce
a stability version of the property. It is easily seen that Proposition 4
of KSh (1980) is an obvious corollary of Theorem 4 and also it
is not difficult now to state a stability version of our Theorem 2 of
Section 2 based on Theorem 4. (Note that in view of what was revealed
in Remark 13, the statement of Theorem 4 simplifies under the situation
in Theorem 2.) It is also worth pointing out in this place that (in view
of Proposition 5 of KSh (1980)) the "only if" part of Proposition 8 of
KSh (1980) follows as a corollary of the first part of Theorem 4 by




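letting $S = R$, $T = (-\infty,b)$, $\mathcal{A}_t = \{\ldots\}$ for every $t \in (-\infty,b)$, $\mathcal{P} = \mathcal{P}_1 =$ the
set of measures in the sequence $\{P_{F_n} : n \ge 0,\ F_0 = F\}$, $\mathcal{P}_2 = \mathcal{P}_3 = \{P_F\}$,
and for each $t$ in $T$ and $P^*$ in $\mathcal{P}$

$$h(x\,|\,t,A_t,P^*) = \begin{cases} (P^*([x,\infty)))^{-1}\, I_{(-\infty,t]}(x) & \text{if } P^*([x,\infty)) > 0 \\ 0 & \text{otherwise;} \end{cases}$$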

moreover, if some simple initial observations are made and $\mathcal{P}_1$, $\mathcal{P}_2$
and $\mathcal{P}_3$ are appropriately redefined, the "if" part of Proposition 8 of
KSh (1980) follows from the second part of Theorem 4. Essentially the
same argument leads to the following stability version of the
characterization result in our Corollary 2 of Section 3. This result clearly
subsumes Proposition 8 of KSh (1980).


COROLLARY 3. Let $p \ge 1$ and $\{F_n : n = 1,2,\ldots\}$ be a sequence of d.f.'s
on $R^p$ and $F$ be a continuous d.f. on $R^p$. Assume that $F$ and, for each $n$,
$F_n$ are members of the set $\mathcal{C}$ defined in the last section. Then

$$F_n(x) \to F(x) \text{ for all } x \in R^p$$

if and only if

$$\nu^*_{F_n}(x) \to \nu^*_F(x) \text{ for all } x \text{ with } \bar F(x) > 0,$$

where the notation $\nu^*_G(x)$ stands for the vector whose elements (given
in some specified order) are $\nu_G((-\infty,x])$ and its counterparts relative
to all the univariate and multivariate marginals of $G$, with appropriate
subvectors in place of $x$ and appropriate number of components in $\infty$,
and $\bar F$ stands for the survivor function corresponding to $F$ as in the
earlier sections.
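
The equivalence in Corollary 3 can be illustrated in the simplest case. The following is a minimal sketch, assuming $p = 1$, exponential d.f.'s $F_n$ with rate $1 + 1/n$ and the limit $F$ with rate 1 (hypothetical choices, with hypothetical function names). For a continuous univariate d.f. the relation $P_F(B) = \int_B \bar F\,d\nu_F$ gives $\nu_F((-\infty,x]) = -\log \bar F(x)$, which here equals the rate times $x$ for $x \ge 0$.

```python
import math


def cdf(x, rate):
    """Exponential d.f. F(x) = 1 - exp(-rate * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def cumulative_hazard(x, rate):
    """nu_F((-inf, x]) = -log(F-bar(x)), which equals rate * x for x >= 0."""
    return -math.log(1.0 - cdf(x, rate))


def gaps(n, x):
    """Distances |F_n(x) - F(x)| and |nu_{F_n} - nu_F| at the point x."""
    rate_n = 1.0 + 1.0 / n
    return (abs(cdf(x, rate_n) - cdf(x, 1.0)),
            abs(cumulative_hazard(x, rate_n) - cumulative_hazard(x, 1.0)))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, gaps(n, 2.0))
```

Both gaps shrink together as $n$ grows, which is the univariate content of the corollary in this particular family.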